Important and new features with analysis for disfluency interruption point (IP) detection in spontaneous Mandarin speech
نویسندگان
چکیده
This paper presents a whole set of new features, some duration-related and some pitch-related, to be used in disfluency interruption point (IP) detection for spontaneous Mandarin speech, considering the special linguistic characteristics of Mandarin Chinese. Decision tree is incorporated into the maximum entropy model to perform the IP detection. By examining performance degradation when each specific feature was missing from the whole set, the most important features for IP detection for each disfluency type were analyzed in detail. The experiments were conducted on the Mandarin Conversational Dialogue Corpus (MCDC) developed by the Institute of Linguistics of Academia Sinica in Taiwan.
منابع مشابه
Improved spontaneous Mandarin speech recognition by disfluency interruption point (IP) detection using prosodic features
In this paper, a new approach for improved spontaneous Mandarin speech recognition with disfluencies well considered is presented. The basic idea is to detect the disfluency interruption points (IPs) prior to the recognition, and then to use these information during rescoring in the recognition process. For accurate detection of disfluency interruption points (IPs), a whole set of new features ...
متن کاملSpontaneous Mandarin Speech Recognition with Disfluencies Detected by Latent Prosodic Modeling (LPM)
In this paper, a new approach for improved spontaneous Mandarin speech recognition using Latent Prosodic Modeling (LPM) for disfluency interruption point (IP) detection is presented. The basic idea is to detect the disfluency interruption points (IPs) prior to the recognition, and then to incorporate these information into the recognition process via the second pass rescoring. For accurate dete...
متن کاملLatent prosodic modeling (LPM) for speech with applications in recognizing spontaneous Mandarin speech with disfluencies
In this paper, a new approach of Latent Prosodic Modeling (LPM) for analyzing the prosody of speech is presented. Based on a set of newly defined prosodic characters, prosodic terms, documents, and the Probabilistic Latent Semantic Analysis (PLSA) framework, prosody can be modeled using a set of prosodic states representing various latent factors such as speakers, speaking rate, utterance modal...
متن کاملContextual Maximum Entropy Model for Edit Disfluency Detection of Spontaneous Speech
This study describes an approach to edit disfluency detection based on maximum entropy (ME) using contextual features for rich transcription of spontaneous speech. The contextual features contain word-level, chunk-level and sentence-level features for edit disfluency modeling. Due to the problem of data sparsity, word-level features are determined according to the taxonomy of the primary featur...
متن کاملApplication of Structural Events Detected on ASR Outputs for Automated Speaking Assessment
We investigated features reflecting utterance structure and disfluency profile to improve the automated scoring of spontaneous speech responses by non-native speakers of English. Features derived from structural events (SEs), e.g., clause structure and disfluencies, showed promisingly high correlations to the human proficiency scores. However, previous studies were based on speech transcription...
متن کامل